-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[2/3 sled-agent] move some common types into their own crate #6122
[2/3 sled-agent] move some common types into their own crate #6122
Conversation
Created using spr 1.3.6-beta.1 [skip ci]
Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1
@@ -0,0 +1,606 @@ | |||
// This Source Code Form is subject to the terms of the Mozilla Public |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Full-admission, I'm sorta skimming over this rather than reading line-by-line because it looks mostly copied
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's fine since it's almost entirely code movement, there are no functional changes so it's more just making sure the code organization looks fine. Thanks!
@@ -1,14 +1,14 @@ | |||
use crate::helpers::generate_name; | |||
use anyhow::{anyhow, Context as _, Result}; | |||
use chrono::Utc; | |||
use omicron_sled_agent::rack_setup::config::SetupServiceConfig; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call to stop using this alias
Created using spr 1.3.6-beta.1
…cy graph (#6138) Reorganize some of our data structures and simplify the dependency graph. In particular, there's now just one copy of `OmicronZonesConfig` and related types. This is _mostly_ code reorganization, but there are a couple of interesting bits in here. Scroll down to the Questions section for more. ## Motivation ### 1. The necessity of sled-agent-types The primary driver behind this change was turning sled-agent's API into a trait -- breaking the sled-agent/nexus dependency cycle was one of the main motivations for that work. To recall, we currently have this logical cycle in the graph ([reference](https://rfd.shared.oxide.computer/rfd/0479#circular_deps)): ![Screenshot from 2024-07-21 21-03-02](https://github.com/user-attachments/assets/9227d036-ab37-4b1e-b4ae-e390a84e829b) With the work, we're attempting to move somewhere closer to this graph: ![Screenshot from 2024-07-21 21-07-30](https://github.com/user-attachments/assets/4d93e98e-cfdc-46dc-9244-7fca8b72d5d1) But to make that happen, some common types need to be moved into a `sled-agent-types` crate, so that the upcoming `sled-agent-api` crate can depend on them. Why not put them in `sled-agent-api` itself? Because: * In `sled-agent-client`, we often end up normalizing to a shared type -- so we either want to implement `From` conversions between these types and `sled-agent-client` types, or to `replace` them outright. So one of them has to depend on the other. * If there's even one type that we want to `replace`, the dependency direction must be `sled-agent-client` -> `sled-agent-api`. * But the client depending on the API trait is exactly the kind of circular dependency we're trying to avoid. * Another reason for trying to avoid a circular dependency here is that it has the potential to leave types "floating in the ether", just circulating via the schema with no definition in Rust. (This actually is a risk today with the way dependencies are set up!) So the only feasible alternative is to put these types in a shared crate. This crate is `sled-agent-types`, which we recently introduced in #6122. ### 2. sled-agent-types <-> nexus-types For types shared between Nexus and sled-agent, where should we put them? The options are: * **omicron-common**: this is where we've been putting these types in the past, and is not a terrible place to put them. However, if those types pull in extra dependencies it becomes a bit more problematic, because omicron-common is depended on by several workspaces outside omicron. * A secondary option would be putting these types behind an optional feature. But that would cause a lot of unnecessary rebuilds of omicron-common and all of its dependents (essentially splitting the build unit universe into two), while the option we eventually chose does not. * **sled-agent-types** or **nexus-types**: we could put them in one or the other, but we'd have to introduce a dependency between them at some point, and they're both logically peers -- neither direction seems like the right one. * **nexus-config**: This depends on `tokio-postgres` (and was actually split out of omicron-common for that reason). It wouldn't be great for nexus-client or sled-agent-client to transitively pull in tokio-postgres. * **a new crate**: This is the option we've chosen. It does add the complexity of another crate, but resolves all of the previous points. We're calling this **nexus-sled-agent-shared** to signal that it sits on top of omicron-common and below nexus-types and sled-agent-types. Something to consider is whether we want to have two clients in omicron, one for internal consumers with more dependencies and one for external consumers with fewer ones. I'm not saying we should do this; it's just something to keep at the back of one's head. ## The changes There are a number of closely-related changes here -- most of them had to be atomic with this one. A description of the changes and the rationale for them: ### 1. Zone config types These are `OmicronZonesConfig`, `OmicronZoneConfig`, `OmicronZoneType`, and `OmicronZoneDataset`. In this PR, I've moved them from `sled-agent` to `nexus-sled-agent-shared`. * These types are an important part of the sled-agent API, and `nexus-types` previously used the progenitor-generated copy of them. * However that has had some issues for us, e.g. #4988. * Personally speaking, having two copies of such a complex type floating around has been suboptimal for me while I've been writing blueprint code. For example, previously, ctrl-clicking on `OmicronZonesConfig` in vscode always pointed me at the progenitor macro which wasn't helpful. Now, it points at the definition in `nexus-sled-agent-shared`. * There's an extra transitive dependency this introduces, which is why they're in `nexus-sled-agent-shared`. There's one incident that I found along the way that I believe is a **bug fix**. We have rules for DNS server IP addresses: * External DNS servers can be IPv4 or IPv6 * Internal DNS servers are always v6 But the Progenitor-generated copy of `OmicronZonesConfig` uses strings to store address/port pairs. As a result, in some cases we managed to store an IPv4 address for an internal DNS server. With this PR, this case is now detected. (I think it would have been ultimately rejected at some point, but it's possible for the bad DNS server to end up being persisted.) ### 2. `ZoneKind` There were previously two dataless enums describing the kind of Omicron zone floating around. 1. [`ZoneKind` in `sled-agent-client`](https://github.com/oxidecomputer/omicron/blob/26959bfc1a073c91a920bd521da13a980d137e3f/clients/sled-agent-client/src/lib.rs#L87), used by Nexus: exactly `OmicronZoneType` without any associated data. 2. [`ZoneType` in `sled-agent`](https://github.com/oxidecomputer/omicron/blob/26959bfc1a073c91a920bd521da13a980d137e3f/sled-agent/src/params.rs#L265), used by sled-agent -- identifying the substring used within the service (notably with `BoundaryNtp` and `InternalNtp` combined into one). While looking around, I also noticed that there were actually four different string representations in use for Omicron zones: 1. the zone prefix 2. the service prefix 3. a prefix for the `Name` instance 4. a string used in reports and error messages. Here, 1-3 are pretty similar to each other (they combine `InternalNtp` and `BoundaryNtp` down to one, and are generated by sled-agent's `ZoneType`). There is also a `ZoneType` in nexus-db-model that corresponds to `ZoneKind`. Based on this, I decided to combine it all down into one `ZoneKind` enum, with 4 functions returning 4 string representations: one each for 1-4. * There's no `Display` impl -- users must choose the representation they want, which I think is correct. * I've tried to carefully verify that there are **no behavior changes** introduced here. * An alternative would be to have a dedicated representation for 1-3 along with one for 4, but I tried it out and it seemed too unwieldy in practice. (For example, a few places want to use the zone prefix, but then report errors using the report string. Having it be one enum with four representations in string-land makes this easier. I do hope we won't ever need a fifth, though.) ### 3. Physical disk config `OmicronPhysicalDiskConfig` and `OmicronPhysicalDisksConfig` are shared types used by both Nexus and sled-agent. I moved them from `sled-storage` to `omicron-common`, next to `DiskIdentity` (which was already in `omicron-common` and which these types use). * Previously, `nexus-types` used the `sled-agent-client`-generated copy of these types -- but as discussed in the motivation section, that is no longer an option. * Why not `nexus-sled-agent-shared`? Well, `sled-storage` also needs these types, and I wanted to avoid sled-storage depending on the additional dependencies in `nexus-sled-agent-shared`. * I also used `replace` to patch them in so that there's just one copy of these types throughout sled-agent and Nexus. ### 4. Recovery silo types These are two types, `RecoverySiloConfig` and `Certificate`, that are primarily related to Nexus and that `sled-agent` was using via `nexus-client`. Users of these types needed to be moved into sled-agent-types, which (as described in the motivation section) cannot depend on nexus-client. So I've moved them into shared crates. * `Certificate` is a simple type that can go into `omicron-common`'s `api::internal::nexus`. * `RecoverySiloConfig` depends on `omicron-passwords`, so it goes in `nexus-sled-agent-shared`. We also patch these types in `nexus-client`. One **behavior change** as a result is that password hash strength is now validated on **both the client and the server.** I think this is okay, but it's worth mentioning explicitly. ### 5. `UserId` So this is a really interesting one. It needs to be moved into a shared location because `RecoverySiloConfig` depends on it. But during this process I noticed that we weren't doing any validation on it in clients like wicketd. So this counts as a **behavior change**: `UserId` is now validated on **both the client and the server**. Note that `UserId` is exposed by the Nexus external API. But this change itself has no impact on the external API (this can be seen by the fact that `nexus.json` doesn't have any changes in it) (I realized that the name should probably be something closer to `LocalUserId`, but that would be a breaking change to the external API so I've deferred it. Happy to change the name if it makes sense, though.) ### 6. Inventory types These are `Inventory`, `InventoryDisk` and `InventoryZpool`. I've moved them to `nexus-sled-agent-shared` as a replacement for the previous scheme where we were using client-generated versions of them. ### 7. `DiskVariant`, `SledRole`, other misc types These are simple types depended upon by others that have been moved into shared crates -- locations chosen mostly based on which code depends on them. ## OpenAPI changes There are some changes to our OpenAPI specifications. * Most of the changes are to `description` instances now that they're no longer going through Progenitor twice. * There are some validation changes, as documented above. ## Questions - [x] What should the name of the new crate be? - Resolved in 2024-07-23 meeting: **nexus-sled-agent-shared**. - [x] Verify that the behavior changes documented above are all reasonable. ## Other notes It would be nice to have tooling which asserts that e.g. none of the `-types` crates ever depend on `-client` crates. Haven't written this tooling though. Previously I did similar things using [guppy](https://github.com/guppy-rs/guppy), but that's only available as a library and can take a while to build (the total number of build units for `xtask` goes from 254 to 279, which is significant). Turning this into a binary would solve this but requires some work.
These are core API types used by the bootstrap agent. This is almost entirely
pure code movement with no functional changes.
These types will be exposed via the upcoming bootstrap-agent-api crate, and I
don't want to clutter the API crate too much. In the future we could also
batch-replace these with the new progenitor stuff if desired.
Depends on #6089.